Artificial intelligence system for continuous affect estimation from naturalistic human expressions
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.

The analysis and automatic estimation of affect from human expressions is an active research topic in the computer vision community. Most reported affect recognition systems, however, only consider subjects performing well-defined, acted expressions under very controlled conditions, so they are not robust enough for real-life recognition tasks involving subject variation, acoustic surroundings and illumination changes. In this thesis, an artificial intelligence system is proposed to continuously estimate affective behaviour (represented along a continuum, e.g. from -1 to +1) in terms of latent dimensions (e.g. arousal and valence) from naturalistic human expressions. To tackle these issues, both feature representation and machine learning strategies are addressed. In feature representation, human expression is captured through several modalities: audio, video, physiological signals and text. Hand-crafted features are extracted from each modality per frame, in order to match the consecutive affect labels. However, the extracted features may be missing information due to factors such as background noise or lighting conditions. The Haar Wavelet Transform is employed to determine whether a noise-cancellation mechanism in feature space should be considered in the design of the affect estimation system. In addition to hand-crafted features, deep learning features are analysed layer-wise, across the convolutional and fully connected layers. Convolutional Neural Networks such as AlexNet, VGGFace and ResNet are selected as the deep learning architectures for feature extraction from facial expression images. A multimodal fusion scheme is then applied, fusing deep learning and hand-crafted features together to improve performance. In machine learning strategies, a two-stage regression approach is introduced. In the first stage, baseline regression methods such as Support Vector Regression estimate each affect dimension per time step. In the second stage, a subsequent model such as a Time Delay Neural Network, Long Short-Term Memory network or Kalman Filter models the temporal relationships between consecutive estimates of each affect dimension. In doing so, the temporal information exploited by the subsequent model is not biased by the high variability present in consecutive frames, and at the same time the network can exploit the slowly changing emotional dynamics more efficiently. Following the two-stage regression approach for unimodal affect analysis, the fusion of information from different modalities is elaborated. Continuous emotion recognition in the wild is improved by investigating mathematical models for each emotion dimension: Linear Regression, Exponent Weighted Decision Fusion and Multi-Gene Genetic Programming are implemented to quantify the relationships between modalities. In summary, the research presented in this thesis develops a systematic approach to automatically and continuously estimate affect from naturalistic human expressions. The proposed system, consisting of feature smoothing, deep learning features, a two-stage regression framework and fusion via mathematical models of the relationships between modalities, offers a strong basis for the development of artificial intelligence systems for continuous affect estimation and, more broadly, for building real-time emotion recognition systems for human-computer interaction.

Majlis Amanah Rakyat (MARA), Malaysia
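The two-stage idea described above — frame-wise regression followed by a temporal model over consecutive estimates — can be sketched in a few lines. The sketch below is illustrative only: it uses ridge regression as a dependency-free stand-in for Support Vector Regression, synthetic data in place of real affect annotations, and a scalar Kalman filter as the second-stage temporal model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-frame features and a slowly varying arousal trace (illustrative data).
T, D = 200, 8
X = rng.normal(size=(T, D))
w_true = rng.normal(size=D)
arousal = np.convolve(X @ w_true, np.ones(15) / 15, mode="same")  # slow dynamics

# Stage 1: frame-wise regression (ridge here, standing in for SVR).
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ arousal)
raw_pred = X @ w  # noisy per-frame estimates

# Stage 2: a 1-D Kalman filter smooths the consecutive estimates,
# so the temporal model is not dominated by frame-to-frame variability.
def kalman_smooth(z, q=1e-3, r=1e-1):
    x, p = z[0], 1.0
    out = np.empty_like(z)
    for t, zt in enumerate(z):
        p = p + q              # predict (random-walk state model)
        k = p / (p + r)        # Kalman gain
        x = x + k * (zt - x)   # update with measurement zt
        p = (1 - k) * p
        out[t] = x
    return out

smooth_pred = kalman_smooth(raw_pred)
err_raw = np.mean((raw_pred - arousal) ** 2)
err_smooth = np.mean((smooth_pred - arousal) ** 2)
```

The smoothed trace varies far less between frames than the raw stage-one output, which is the property the second stage is meant to provide.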
Automatic depression scale prediction using facial expression dynamics and regression
Depression is a state of low mood and aversion to activity that can affect a person's thoughts, behaviour, feelings and sense of well-being. In such a low mood, both facial expression and voice differ from those in a normal state. In this paper, an automatic system is proposed to predict Beck Depression Inventory scores from the naturalistic facial expressions of patients with depression. First, features are extracted from the corresponding video and audio signals to represent the characteristics of facial and vocal expression under depression. Second, a dynamic feature generation method is proposed in the extracted video feature space, based on the idea of the Motion History Histogram (MHH) for 2-D video motion extraction. Third, Partial Least Squares (PLS) regression and linear regression are applied to learn the relationship between the dynamic features and depression scores on training data, and then to predict the scores for unseen subjects. Finally, decision-level fusion combines the predictions from the video and audio modalities. The proposed approach is evaluated on the AVEC2014 dataset, and the experimental results demonstrate its effectiveness.

The work by Asim Jan was supported by a School of Engineering & Design/Thomas Gerald Gray PGR Scholarship. The work by Hongying Meng and Saeed Turabzadeh was partially funded by the award of the Brunel Research Initiative and Enterprise Fund (BRIEF). The work by Yona Falinie Binti Abd Gaus was supported by a Majlis Amanah Rakyat (MARA) Scholarship.
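The decision-level fusion step described above can be sketched simply: each modality produces its own scale prediction, and the predictions are combined with weights reflecting each modality's reliability. The code below is a toy illustration with synthetic data and plain least-squares regressors standing in for PLS; the inverse-error weighting rule is an assumption for the sketch, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: per-subject video and audio feature vectors with BDI-like targets.
n, dv, da = 60, 10, 6
Xv, Xa = rng.normal(size=(n, dv)), rng.normal(size=(n, da))
y = Xv @ rng.normal(size=dv) + 0.5 * Xa @ rng.normal(size=da)

# Per-modality linear regressors (least squares here; the paper uses PLS).
wv, *_ = np.linalg.lstsq(Xv, y, rcond=None)
wa, *_ = np.linalg.lstsq(Xa, y, rcond=None)
pred_v, pred_a = Xv @ wv, Xa @ wa

# Decision-level fusion: weight each modality by the other's training error,
# so the lower-error modality receives the larger weight.
ev = np.mean((pred_v - y) ** 2)
ea = np.mean((pred_a - y) ** 2)
alpha = ea / (ev + ea)
fused = alpha * pred_v + (1 - alpha) * pred_a
```

Because the fusion is a convex combination, its squared error is never worse than the worse single modality on the same data.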
Hidden Markov Model-Based Gesture Recognition with Overlapping Hand-Head/Hand-Hand Estimated Using Kalman Filter
In this paper, we introduce a hand gesture recognition system for isolated Malaysian Sign Language (MSL). The system consists of four modules: collection of input images, feature extraction, Hidden Markov Model (HMM) training, and gesture recognition. First, we apply a skin segmentation procedure to the input frames to detect only the skin regions. We then extract features consisting of centroids, hand distances and hand orientations. A Kalman Filter is used to identify overlapping hand-head or hand-hand regions. Once the feature vector has been extracted, the hand gesture trajectory is represented as a gesture path to reduce system complexity. We apply Hidden Markov Models (HMMs) to recognize the input gesture: the gesture to be recognized is scored separately against the different HMMs, and the model with the highest score indicates the corresponding gesture. In our experiments, the system was tested on 112 MSL gestures and achieved a recognition rate of about 83%.
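The final recognition step — scoring one observation sequence against several trained HMMs and picking the highest-scoring model — can be sketched with a scaled forward algorithm over discrete observation codes. The two models, their parameters and the quantized gesture path below are all hypothetical, chosen only to make the selection step concrete.

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm for a discrete-observation HMM."""
    alpha = pi * B[:, obs[0]]          # initialize with first observation
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate and absorb next observation
        s = alpha.sum()
        ll += np.log(s)                # accumulate log-likelihood via scaling
        alpha /= s
    return ll

# Two toy 2-state models over 3 quantized direction codes (hypothetical gestures).
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B_right = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]])  # favours codes 0/1
B_up    = np.array([[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]])  # favours codes 2/1
models = {"right": B_right, "up": B_up}

obs = np.array([0, 0, 1, 1, 0])        # a quantized gesture path
scores = {name: log_likelihood(obs, pi, A, B) for name, B in models.items()}
best = max(scores, key=scores.get)     # highest-scoring model wins
```

In a full system each HMM would be trained on example trajectories of its gesture; here the emission matrices are fixed by hand so the selection is deterministic.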
Electrical characterization and source-drain voltage dependent mobility of P-channel organic field-effect transistors using MATLAB simulation
We demonstrate the fabrication of a p-channel (small-molecule) organic field-effect transistor (OFET) with a bottom gate and top source-drain contacts, using pentacene as the active semiconductor layer and silicon dioxide (SiO2) as the gate dielectric. The device exhibits the typical output curves of a field-effect transistor (FET). Furthermore, the electrical characterization was analysed to investigate the source-drain voltage (Vds) dependence of the mobility. The mobility, calculated using a MATLAB simulation, ranged from 0.0234 to 0.0258 cm2/Vs with increasing source-drain voltage (the average mobility was 0.0254 cm2/Vs). This work suggests that the mobility increases with increasing source-drain voltage, similar to the gate-voltage-dependent mobility phenomenon.
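The standard saturation-regime mobility extraction behind such a calculation can be sketched as follows (in Python rather than MATLAB, and with entirely hypothetical device parameters): fit the slope of sqrt(Id) against Vgs and recover the mobility from the square-law relation Id = (W·Cox·μ / 2L)·(Vgs − Vth)².

```python
import numpy as np

# Hypothetical device parameters (illustrative, not taken from the paper).
W, L = 0.1, 0.005      # channel width and length [cm]
C_ox = 1.2e-8          # gate-dielectric capacitance per unit area [F/cm^2]
mu_true = 0.025        # mobility used to generate the synthetic curve [cm^2/Vs]
V_th = -5.0            # threshold voltage [V] (p-channel)

# Synthetic saturation-regime transfer curve: Id = (W*C_ox*mu)/(2L) * (Vgs - Vth)^2
Vgs = np.linspace(-40.0, -10.0, 31)
Id = (W * C_ox * mu_true) / (2 * L) * (Vgs - V_th) ** 2

# Extract mobility from the slope of sqrt(Id) vs Vgs (linear in the saturation regime).
slope = np.polyfit(Vgs, np.sqrt(Id), 1)[0]
mu_sat = 2 * L / (W * C_ox) * slope ** 2
```

On real measured data the same fit would be applied to the linear portion of sqrt(Id) above threshold; here the synthetic curve makes the recovery exact.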
Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers
Generating 3D images of complex objects conditionally from a few 2D views is a difficult synthesis problem, compounded by issues such as domain gap and geometric misalignment. For instance, a unified framework such as Generative Adversarial Networks cannot achieve this unless it explicitly defines both a domain-invariant and a geometry-invariant joint latent distribution, whereas Neural Radiance Fields are generally unable to handle both issues because they optimize at the pixel level. By contrast, we propose a simple and novel 2D-to-3D synthesis approach based on conditional diffusion with vector-quantized codes. Operating in an information-rich code space enables high-resolution 3D synthesis via full-coverage attention across the views. Specifically, we generate the 3D codes, e.g. for CT images, conditioned on previously generated 3D codes and the entire codebook of two 2D views (e.g. 2D X-rays). Qualitative and quantitative results demonstrate state-of-the-art performance over specialized methods across varied evaluation criteria, including fidelity metrics such as density and coverage, and distortion metrics, on two datasets of complex volumetric imagery found in real-world scenarios.
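The "information-rich code space" such methods operate in comes from vector quantization: continuous latents are replaced by indices into a learned codebook, and the diffusion model works over those discrete codes. A minimal nearest-neighbour sketch of the encode/decode step (with a random stand-in codebook, not a trained VQ model) looks like this:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy codebook of K code vectors of dimension d (a stand-in for a trained VQ codebook).
K, d = 16, 4
codebook = rng.normal(size=(K, d))

def encode(z):
    """Map each latent vector to the index of its nearest codebook entry."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def decode(idx):
    """Replace indices with their code vectors (back into continuous space)."""
    return codebook[idx]

z = rng.normal(size=(10, d))  # continuous latents, e.g. from an encoder
codes = encode(z)             # discrete codes the diffusion model would operate on
z_q = decode(codes)           # quantized reconstruction fed to a decoder
```

Encoding is idempotent in the sense that each codebook entry maps back to its own index, which is what makes the discrete codes a faithful address space for the latents.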